Compressing the Digital Library
Abstract
The prospect of digital libraries presents the challenge of storing vast amounts of information efficiently and in a way that facilitates rapid search and retrieval. Storage space can be reduced by appropriate compression techniques, and searching can be enabled by constructing a full-text index. But these two requirements are in conflict: the need for decompression increases access time, and the need for an index increases space requirements. Data compression saves space at the expense of added access time, while indexing methods provide fast access, usually at the expense of considerable additional storage space. Indeed, a complete index to a large body of text can be larger than the text itself, since it might store the location of every word in the text. This paper resolves the conflict by showing how (a) large bodies of text can be compressed and indexed into less than half the space required by the original text alone, (b) full-text queries (Boolean or ranked) can be answered in small fractions of a second, and (c) documents can be decoded at a rate of approximately one megabyte per second.
This paper shows that it is possible to make compression and indexing work together efficiently. We describe compression methods suited to large amounts of text that allow both random-access decoding of individual documents and fast execution; we show how the index can itself be compressed so that only a small amount of overhead space is required to store it; and we show how the application of compression techniques allows efficient construction of the index in the first instance. The result is a system that can take a large body of text and convert it into a compressed text and index that together take up less than half the space occupied by the original data.
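The abstract does not say which codes are used to shrink the index. As a minimal sketch, assuming one common approach for inverted files: store each word's sorted list of document numbers as gaps between successive numbers and encode each gap with an Elias gamma code, so that frequent words (whose gaps are small) cost only a few bits per entry. A conjunctive Boolean query is then answered by decoding and intersecting the posting lists. The function names are hypothetical, and bits are kept as a string of '0'/'1' characters for readability rather than packed into bytes as a real system would do.

```python
from typing import Dict, List


def gamma_encode(n: int) -> str:
    """Elias gamma code for a positive integer n, as a string of bits."""
    if n < 1:
        raise ValueError("gamma codes are defined only for n >= 1")
    binary = bin(n)[2:]                         # e.g. 5 -> '101'
    return "0" * (len(binary) - 1) + binary     # unary length prefix, then the value


def gamma_decode_all(bits: str) -> List[int]:
    """Decode a concatenation of gamma codes back into integers."""
    values, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":                   # count the unary length prefix
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values


def compress_postings(doc_ids: List[int]) -> str:
    """Store a sorted posting list as gamma-coded gaps between document numbers."""
    bits, prev = [], 0
    for d in doc_ids:
        bits.append(gamma_encode(d - prev))     # gaps are small for frequent words
        prev = d
    return "".join(bits)


def decompress_postings(bits: str) -> List[int]:
    """Turn gamma-coded gaps back into absolute document numbers."""
    doc_ids, total = [], 0
    for gap in gamma_decode_all(bits):
        total += gap
        doc_ids.append(total)
    return doc_ids


def build_index(docs: List[str]) -> Dict[str, str]:
    """Map each word to its compressed posting list (documents numbered from 1)."""
    postings: Dict[str, List[int]] = {}
    for doc_id, text in enumerate(docs, start=1):
        for word in set(text.lower().split()):
            postings.setdefault(word, []).append(doc_id)
    return {w: compress_postings(ids) for w, ids in postings.items()}


def boolean_and(index: Dict[str, str], terms: List[str]) -> List[int]:
    """Answer a conjunctive Boolean query by intersecting decoded posting lists."""
    result = None
    for term in terms:
        ids = set(decompress_postings(index.get(term.lower(), "")))
        result = ids if result is None else result & ids
    return sorted(result or [])
```

For example, with the three documents "compression saves space", "indexing provides fast access" and "compression and indexing together", `boolean_and(build_index(docs), ["compression", "indexing"])` returns `[3]`. Numbering documents from 1 keeps every gap, including the first, a positive integer, which gamma codes require.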
Combining the two techniques incurs little penalty in access speed—in fact, the access speed can even be improved, since there is less data to be read in from slow secondary storage devices. Moreover, the initial indexing and compression processes can be effected on a mid-range workstation at a rate of several hundred megabytes an hour.
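Random-access decoding is what keeps the access penalty small: a query answer only needs the compressed bytes of the matching documents, not the whole collection. A minimal sketch, assuming each document is compressed independently and an offset table records where each one starts. Python's `zlib` is used here purely as a stand-in; it is not the paper's text compression method, and compressing documents separately with it forgoes statistics shared across the collection. The function names are hypothetical.

```python
import zlib
from typing import List, Tuple


def compress_collection(docs: List[str]) -> Tuple[bytes, List[int]]:
    """Compress each document separately and record where each one starts.

    Because documents are compressed independently (here with zlib, a
    stand-in for a model suited to large text), any single document can be
    decoded without touching the others.
    """
    blob = bytearray()
    offsets = [0]
    for text in docs:
        blob += zlib.compress(text.encode("utf-8"))
        offsets.append(len(blob))      # bytes offsets[i]..offsets[i+1] hold document i
    return bytes(blob), offsets


def fetch_document(blob: bytes, offsets: List[int], i: int) -> str:
    """Read and decode only the compressed bytes of document i (0-based)."""
    start, end = offsets[i], offsets[i + 1]
    return zlib.decompress(blob[start:end]).decode("utf-8")
```

In a disk-based system `blob` would be a file, and `fetch_document` would seek to `offsets[i]` and read `offsets[i + 1] - offsets[i]` bytes. Reading fewer bytes from slow secondary storage is the reason the abstract gives for why compression can even improve access speed.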